DBSC: A Dependency-Based Subspace Clustering Algorithm for High Dimensional Numerical Datasets
Abstract
We present a novel algorithm called DBSC, which finds subspace clusters in numerical datasets based on the concept of “dependency”. The algorithm uses a depth-first search strategy to find the maximal subspaces: a new dimension is added to the current k-subspace, and the validity of the result as a (k+1)-subspace is evaluated. The clusters within those maximal subspaces are then mined in a fashion similar to the mining of the maximal subspaces themselves. Experiments on synthetic and real datasets show that our algorithm is both effective and efficient for high-dimensional datasets.
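The depth-first extension step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define the dependency test, so `is_valid` is a hypothetical caller-supplied predicate, and `toy_is_valid` with its `DEPENDENT_PAIRS` table is an invented stand-in. A subspace is treated as maximal here when no single added dimension yields a valid (k+1)-subspace.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set, Tuple


def maximal_subspaces(
    n_dims: int,
    is_valid: Callable[[FrozenSet[int]], bool],
) -> List[Tuple[int, ...]]:
    """Depth-first search for subspaces that cannot be validly extended.

    `is_valid` is a hypothetical stand-in for DBSC's dependency-based
    validity check, which the abstract does not specify.
    """
    maximal: Set[FrozenSet[int]] = set()
    visited: Set[FrozenSet[int]] = set()

    def dfs(subspace: FrozenSet[int]) -> None:
        if subspace in visited:
            return
        visited.add(subspace)
        extended = False
        for d in range(n_dims):
            if d in subspace:
                continue
            # Add one dimension to the current k-subspace and test the
            # resulting (k+1)-subspace.
            candidate = subspace | {d}
            if is_valid(candidate):
                extended = True
                dfs(candidate)
        if not extended:
            # No valid one-dimension extension exists.
            maximal.add(subspace)

    for d in range(n_dims):
        start = frozenset([d])
        if is_valid(start):
            dfs(start)
    return sorted(tuple(sorted(s)) for s in maximal)


# Invented toy data: pairs of dimensions assumed to be "dependent".
DEPENDENT_PAIRS = {(0, 1), (0, 2), (1, 2), (3, 4)}


def toy_is_valid(subspace: FrozenSet[int]) -> bool:
    """Toy check: every pair of dimensions in the subspace is dependent."""
    return all(p in DEPENDENT_PAIRS for p in combinations(sorted(subspace), 2))


result = maximal_subspaces(5, toy_is_valid)
print(result)
```

With this toy predicate, dimensions 0, 1, and 2 are pairwise dependent and so are 3 and 4, so the search reports those two groups as the maximal subspaces. The `visited` set keeps a subspace from being explored again when it is reached by adding the same dimensions in a different order.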
Similar Resources
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Holo-Entropy Based Categorical Data Hierarchical Clustering
Clustering high-dimensional data is a challenging task in data mining, and clustering high-dimensional categorical data is even more challenging because it is more difficult to measure the similarity between categorical objects. Most algorithms assume feature independence when computing similarity between data objects, or make use of computationally demanding techniques such as PCA for numerica...
A Robust k-Means Type Algorithm for Soft Subspace Clustering and Its Application to Text Clustering
Soft subspace clustering algorithms are effective techniques for high-dimensional datasets. Although several soft subspace clustering algorithms have been developed in recent years, their robustness should be further improved. In this work, a novel soft subspace clustering algorithm, RSSKM, is proposed. It is based on the incorporation of the alternative distance metric into the framework of kme...
Exploring Constraints Inconsistence for Value Decomposition and Dimension Selection Using Subspace Clustering
Datasets in the form of object-attribute-time are referred to as three-dimensional (3D) datasets. Because 3D datasets contain many timestamps, they are very difficult to cluster, so a subspace clustering method is applied to them. Existing algorithms are inadequate to solve this clustering problem. Most of them are not actionable (ability to suggest profitable or benef...
Temporal Subspace Clustering for Unsupervised Action Segmentation
Action segmentation (segmenting a continuous sequence of motion data into a set of actions) has a wide range of applications and plays a role in many problems in computer vision. We look at subspace clustering as an unsupervised approach for this task. Classical subspace clustering methods uncover relationships within the data by learning codes for the samples (i.e. frames), but in this process...